14 research outputs found

    Using evolutionary covariance to infer protein sequence-structure relationships

    During the last half century, a deep knowledge of the actions of proteins has emerged from a broad range of experimental and computational methods. This means that there are now many opportunities for understanding how the varieties of proteins affect larger-scale behaviors of organisms, in terms of phenotypes and diseases. It is broadly acknowledged that sequence, structure and dynamics are the three essential components for understanding proteins. Learning about the relationships among protein sequence, structure and dynamics is therefore one of the most important steps for understanding the mechanisms of proteins. Together with the rapid growth in the efficiency of computers, there has been a commensurate growth in the sizes of the public databases for proteins. The field of computational biology has undergone a paradigm shift from investigating single proteins to looking collectively at sets of related proteins and broadly across all proteins. We develop a novel approach that combines structural knowledge from the PDB and the CATH database with sequence information from the Pfam database, using co-evolution in sequences, to achieve the following goals: (a) collection of co-evolution information on a large scale using protein domain family data; (b) development of novel amino acid substitution matrices that incorporate structural information; (c) detection of higher-order co-evolution correlations. The results presented here show that important gains can come from improvements to sequence matching. The approach taken here is simple: the pair correlations in sequence have been decomposed into singlet terms, which amounts to discarding much of the correlation information itself. The gains shown here are encouraging, and we would like to develop a sequence matching method that retains the pair (or higher-order) correlation information directly; this should be possible by developing the sequence matching separately for different domain structures. The many-body correlations in particular have the potential to shift the common focus in biology from pairs, which are not actually very informative, to higher-order interactions. Fully understanding cellular processes will require a large body of higher-order correlation information such as has been initiated here for single proteins.
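
As a rough illustration of what pair co-evolution in a domain family alignment means, the sketch below scores two alignment columns by their mutual information. This is an assumption made for illustration only: the abstract does not say which covariance measure the authors use, and the function name `column_mutual_information` and the toy alignment are hypothetical.

```python
from collections import Counter
from math import log2


def column_mutual_information(msa, i, j):
    """Mutual information (in bits) between columns i and j of an alignment.

    `msa` is a list of equal-length aligned sequences. Every character,
    including a gap symbol, is treated as an ordinary state (a simplifying
    assumption of this sketch).
    """
    n = len(msa)
    pair_counts = Counter((seq[i], seq[j]) for seq in msa)
    counts_i = Counter(seq[i] for seq in msa)
    counts_j = Counter(seq[j] for seq in msa)

    mi = 0.0
    for (a, b), count in pair_counts.items():
        p_ab = count / n
        p_a = counts_i[a] / n
        p_b = counts_j[b] / n
        mi += p_ab * log2(p_ab / (p_a * p_b))
    return mi


# Hypothetical toy alignment: columns 0 and 1 vary together (A with K,
# G with R), column 2 is conserved, column 3 varies independently of column 0.
toy_msa = ["AKLE", "AKLD", "GRLE", "GRLD"]

covarying = column_mutual_information(toy_msa, 0, 1)
independent = column_mutual_information(toy_msa, 0, 3)
```

A high score marks a column pair that changes together across the family. Real analyses would normally also correct for phylogenetic bias and small sample sizes, which this sketch omits.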

    SeqStruct : A New Amino Acid Similarity Matrix Based on Sequence Correlations and Structural Contacts Yields Sequence-Structure Congruence

    Protein sequence matching does not properly account for some well-known features of protein structures, such as surface residues being more variable than core residues and the high packing densities in globular proteins, and it does not yield good matches for the sequences of many proteins known to be close structural relatives. There are now abundant protein sequences and structures to enable major improvements to sequence matching. Here, we utilize structural frameworks to mount the observed correlated sequences to identify the most important correlated parts. The rationale is that protein structures provide the important physical framework for improving sequence matching. Combining the sequence and structure data in this way leads to a simple amino acid substitution matrix that can be readily incorporated into any sequence matching. This enables the incorporation of allosteric information into sequence matching and transforms it effectively from a 1-D to a 3-D procedure. The results from testing in over 3,000 sequence matches demonstrate a 37% gain in sequence similarity and a loss of 26% of the gaps when compared with the use of BLOSUM62. Importantly, there are major gains in the specificity of sequence matching across diverse proteins. Specifically, all known cases where protein structures match but sequences do not match well are resolved. This is a preprint made available through bioRxiv: doi: 10.1101/268904.

    Combining Disparate Data Types: Protein Sequences and Protein Structures

    With the development of high-throughput, next-generation sequencing and other advanced technologies, a large number of gene expression profiles have been produced. Many of these profiles are available from public databases [1-3]. A challenging research problem that has drawn a lot of attention in the past is to infer gene regulatory networks from the expression data. A gene regulatory network is represented by a directed graph, in which nodes represent transcription factors or mRNA with edges showing transcriptional regulatory relationships between two nodes. This article is published as Kejue Jia and Robert L. Jernigan (2015) Combining Disparate Data Types: Protein Sequences and Protein Structures. J Data Mining Genomics Proteomics 6:e117. doi: 10.4172/2153-0602.1000e117. Posted with permission.

    Ribosome Mechanics Informs about Mechanism

    The essential aspects of the ribosome’s mechanism can be extracted from coarse-grained simulations, including the ratchet motion, the movement together of critical bases at the decoding center, as well as movements of the peptide tunnel lining that assist in the expulsion of the synthesized peptide. Because of the ribosome’s large size, coarse-graining helps to simplify and aid the understanding of its mechanism. Results presented here utilize coarse-grained elastic network modeling to extract the dynamics, and both RNAs and proteins are coarse-grained. We review our previous results, showing the well-known ratchet motions and the motions in the peptide tunnel and in the mRNA tunnel. The motions of the lining of the peptide tunnel appear to assist in the expulsion of the growing peptide chain, and clamps with three proteins at the ends of the mRNA tunnel ensure that the mRNA is held tightly during decoding and are essential for the helicase activity at the entrance. The entry clamp may also assist in base recognition to ensure proper selection of the incoming tRNA. The overall precision with which the ribosome operates as a machine is remarkable. This is a manuscript of an article published as Zimmermann, Michael T., Kejue Jia, and Robert L. Jernigan. "Ribosome mechanics informs about mechanism." Journal of molecular biology 428, no. 5 (2016): 802-810. doi:10.1016/j.jmb.2015.12.003. Posted with permission.

    The Use of Experimental Structures to Model Protein Dynamics

    The number of solved protein structures deposited in the Protein Data Bank (PDB) has increased dramatically in recent years. For some specific proteins, this number is very high. For example, there are over 550 solved structures for HIV-1 protease, a protein that is essential for the life cycle of human immunodeficiency virus (HIV), which causes acquired immunodeficiency syndrome (AIDS) in humans. The large number of structures for the same protein and its variants includes a sample of different conformational states of the protein. A rich set of structures solved experimentally for the same protein has information buried within the dataset that can explain the functional dynamics and structural mechanism of the protein. To extract the dynamics information and functional mechanism from the experimental structures, this chapter focuses on two methods: Principal Component Analysis (PCA) and Elastic Network Models (ENM). PCA is a widely used statistical dimensionality reduction technique to classify and visualize high-dimensional data. ENMs, on the other hand, are a well-established, simple biophysical method for modeling the functionally important global motions of proteins. This chapter covers the basics of these two methods. Moreover, an improved ENM version that utilizes the variations found within a given set of structures for a protein is described. As a practical example, we have extracted the functional dynamics and mechanism of the HIV-1 protease dimeric structure by using a set of 329 PDB structures of this protein. We have described, step by step, how to select a set of protein structures, how to extract the needed information from the PDB files for PCA, how to extract the dynamics information using PCA, how to calculate ENM modes, how to measure the congruency between the dynamics computed from the principal components (PCs) and the ENM modes, and how to compute entropies using the PCs. We provide the computer programs or references to software tools to accomplish each step and show how to use these programs and tools. We also include computer programs to generate movies based on PCs and ENM modes and describe how to visualize them. This is a chapter from Katebi A.R., Sankar K., Jia K., Jernigan R.L. (2015) The Use of Experimental Structures to Model Protein Dynamics. In: Kukol A. (eds) Molecular Modeling of Proteins. Methods in Molecular Biology (Methods and Protocols), vol 1215. Humana Press, New York, NY. doi: 10.1007/978-1-4939-1465-4_10. Posted with permission.
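
One step named above, measuring the congruency between a principal component and an ENM mode, can be sketched as the absolute cosine between the two displacement vectors. This is a common choice but an assumption here: the abstract does not state which measure the chapter uses, and `mode_overlap` and the toy vectors are hypothetical names and data.

```python
from math import sqrt


def mode_overlap(pc, enm_mode):
    """Absolute cosine between a principal component and an ENM mode.

    Both arguments are flat sequences of displacement components
    (x1, y1, z1, x2, ...) over the same atoms in the same order.
    Returns a value in [0, 1]; 1 means the two directions coincide
    up to sign and scale.
    """
    if len(pc) != len(enm_mode):
        raise ValueError("vectors must have the same length")
    dot = sum(p * m for p, m in zip(pc, enm_mode))
    norm_pc = sqrt(sum(p * p for p in pc))
    norm_mode = sqrt(sum(m * m for m in enm_mode))
    if norm_pc == 0.0 or norm_mode == 0.0:
        raise ValueError("vectors must be non-zero")
    return abs(dot) / (norm_pc * norm_mode)


# Hypothetical toy vectors for a two-atom system (six components each).
pc1 = [1.0, 0.0, 0.0, -1.0, 0.0, 0.0]
mode_same_direction = [-2.0, 0.0, 0.0, 2.0, 0.0, 0.0]   # same motion, opposite sign
mode_orthogonal = [0.0, 1.0, 0.0, 0.0, -1.0, 0.0]

same = mode_overlap(pc1, mode_same_direction)
orthogonal = mode_overlap(pc1, mode_orthogonal)
```

The absolute value is taken because the sign of an eigenvector is arbitrary, so a PC and a mode that point in opposite directions describe the same motion.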